Abstract

In a recent breakthrough, [Bshouty et al., 2005] obtained the first passive- 
learning algorithm for DNFs under the uniform distribution. They showed 
that DNFs are learnable in the Random Walk and Noise Sensitivity models. 
We extend their results in several directions. We first show that thresholds 
of parities, a natural class encompassing DNFs, cannot be learned efficiently 
in the Noise Sensitivity model using only statistical queries. In contrast, we 
show that a cyclic version of the Random Walk model allows polynomially
weighted thresholds of parities to be learned efficiently. We also extend the
algorithm of Bshouty et al. to the case of Unions of Rectangles, a natural
generalization of DNFs to $\{0, \ldots, b-1\}^n$.

Keywords: Thresholds of parities, PAC learning, random walk model, statistical 
queries. 

1 Introduction 

Learning Boolean formulae in Disjunctive Normal Form (DNF) has been a central 
problem in the computational learning theory literature since Valiant's seminal
paper on PAC learning [25]. In [12], it was shown that DNFs can be learned using
membership queries, a form of active learning. Jackson's algorithm, also known
as Harmonic Sieve (HS), uses a clever combination of two fundamental techniques
in learning, Harmonic Analysis and Boosting. The use of Harmonic Analysis in
the study of Boolean functions was introduced in [15]. It was subsequently used
as the basis of a learning algorithm for $AC^0$ circuits in [20]. The Harmonic
Analysis used in the HS algorithm is based on a parity-finding algorithm of
Goldreich and Levin [10], which was first applied to a learning problem by
Kushilevitz and Mansour [19]. Hypothesis boosting, a technique to reduce the
classification error of a learning algorithm, was introduced by Schapire [23].
The boosting algorithm
used by HS is actually due to Freund Q. 

In a recent breakthrough, Bshouty et al. [5] obtained the first passive learning
algorithm for DNFs. Their algorithm is based on a modification of HS which
focuses on low-degree Fourier coefficients. That variant of HS, called Bounded
Sieve (BS), was first obtained in [4]. In [5], BS was used to learn DNFs under the
uniform distribution in two natural passive learning models. The first one is the
Random Walk model, where examples, instead of being i.i.d., follow a random 
walk on the Boolean cube (see also for related work). The second model
is the closely related Noise Sensitivity model, where this time examples come
in pairs, the second instance being a noisy version of the first one. The results 
of [5] are interesting in that they give a learning algorithm for DNFs in a case
where the observer has no control over the examples provided. However, the
problem of learning DNFs under the uniform distribution when examples are i.i.d.
still remains open. It is known that DNFs cannot be learned in the more
restrictive Statistical Query model (introduced in [16]), where one can ask only
about statistics over random examples.

Jackson [12] also showed that HS applies to thresholds of parities (TOP), a
class that can express DNFs and decision trees with only polynomial increase in
size, and extended his algorithm to the non-Boolean case of unions of rectangles,
a generalization of DNFs to $\{0, \ldots, b-1\}^n$ (where $b = O(1)$). Whether
those classes of functions can be learned in the Random Walk and Noise
Sensitivity models was left open by [5]. Our contribution is threefold. We first
show that TOPs cannot be learned in the Noise Sensitivity model using
statistical queries (SQs)$^1$. As far as we know, this is the first example of a
negative result for "second-order" statistical queries, i.e. queries on pairs of
examples. This does not rule out the possibility of learning TOPs in the Random
Walk model, although it provides evidence that the techniques of [5] cannot be
easily extended to that case. On the other hand, we show that a simple variant of
the Random Walk model, where the component updates follow a fixed cycle, allows
TOPs to be learned efficiently. This seems to be the first not-too-contrived
passive model in which TOPs are efficiently learnable with respect to the
uniform distribution. Actually, one can perform the Harmonic Sieve in this
Cyclic Random Walk model, and we also show that this model is strictly weaker
than the active setting under a standard cryptographic assumption. Finally, we
extend the techniques of [4] and [5] to the non-Boolean domain
$\{0, \ldots, b-1\}^n$ and use this to learn unions of rectangles in the Noise
Sensitivity and Random Walk models. This last result turns out to be rather
straightforward once the proper analogues to the Boolean case are found.

$^1$ [5] uses only SQs.

In Section 2 we introduce the learning models and give a brief review of
Fourier analysis. The negative result for learning TOPs is derived in Section 3.
The learning algorithms for TOPs and Unions of Rectangles are presented in
Sections 4 and 5, respectively.

2 Preliminaries 

We briefly review the learning models we will use and some basic facts about
Fourier analysis. For more details see e.g. [14] and [22].

2.1 Learning Models 

Let $b \in \mathbb{N}$ be a nonzero constant and let $[b] = \{0, \ldots, b-1\}$.
Often we will take $b = 2$. Consider a function $f : [b]^n \to \{1, -1\}$, that
we will call the target function. Think of $f$ as partitioning $[b]^n$ into
positive and negative examples. Denote by $U$ the uniform distribution over
$[b]^n$. The goal of the different learning problems we will consider is
generally to find, for $\epsilon > 0$, an $\epsilon$-approximator $h$ to $f$ under
the uniform distribution, i.e. a function $h$ such that$^2$

\[ \mathbb{P}_{x \sim U}[h(x) \neq f(x)] \leq \epsilon. \]

To achieve this, the learner is given access to limited information which can take
different forms.

The Membership Query (MQ) model allows the learner to ask for the value of $f$
at any point $x$ of its choosing. The Uniform Query (UQ) model, on the other
hand, works as follows: at any time the learner can ask for an example from $f$
and is provided with a pair $(x, f(x))$ where $x \sim U$; all examples are
independent. This type of model is called passive, in contrast to the MQ model,
which is called active, because the learner has no influence over the examples
provided.

$^2$ For convenience, we will drop the notation $x \sim U$ from probabilities and
expectations when it is clear that $x$ is uniform.

In [5], two variants of this model were considered. In the Random Walk (RW)
model, one is given access to random examples $(x, f(x))$ where the successive
values of $x$ follow a random walk on $[b]^n$. Many choices of walks are possible
here. We will restrict ourselves to the case where at each step, one component
of $x$, say $x_i$, is picked uniformly at random and a new value $y$ for $x_i$ is
picked uniformly at random over $[b]$ (the first example is uniform over
$[b]^n$). A related model is the Noise Sensitivity (NS) model. Here a parameter
$\rho \in [0, 1]$ is fixed and when an example is asked, one gets
$(x, y, f(x), f(y), S)$ where $x \sim U$ and $y = \mathcal{N}_\rho(x)$ is a noisy
version of $x$ defined as follows: for each component of $x$ independently, with
probability $1 - \rho$ a new uniform value over $[b]$ is drawn for this component
(we call this operation updating and we call $1 - \rho$ the attribute noise
rate), otherwise the component remains the same$^3$; $S$ is the set of updated
components. We will consider one more variant of these passive models. In the
Cyclic Random Walk (CRW) model, the successive examples $x$ follow a random
walk where at each step, instead of picking a uniformly random component to
update, there is a fixed cycle $(i_1, \ldots, i_n)$ running through all of
$\{1, \ldots, n\}$ and components are updated in that order (the first example is
uniform over $[b]^n$). In all the previous models except MQ, examples are drawn
randomly and we therefore only require the learning algorithm to succeed with
probability $1 - \delta$ for some $\delta > 0$.
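
The three passive example oracles described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`rw_examples`, `crw_examples`, `ns_example`), the 0-based component indexing, and the convention that the target `f` takes a sequence of integers are assumptions made here, not notation from the text.

```python
import random

def rw_examples(f, n, b, steps, rng):
    """Random Walk model: one uniformly chosen component is resampled per step."""
    x = [rng.randrange(b) for _ in range(n)]      # first example is uniform
    yield tuple(x), f(x)
    for _ in range(steps):
        i = rng.randrange(n)                      # component picked at random
        x[i] = rng.randrange(b)                   # new uniform value (may repeat)
        yield tuple(x), f(x)

def crw_examples(f, n, b, steps, rng, cycle=None):
    """Cyclic Random Walk model: components are resampled in a fixed cyclic order."""
    cycle = list(range(n)) if cycle is None else list(cycle)
    x = [rng.randrange(b) for _ in range(n)]
    yield tuple(x), f(x)
    for t in range(steps):
        x[cycle[t % n]] = rng.randrange(b)
        yield tuple(x), f(x)

def ns_example(f, n, b, rho, rng):
    """Noise Sensitivity model: each component is updated independently
    with probability 1 - rho; S is the set of updated components."""
    x = [rng.randrange(b) for _ in range(n)]
    S = {i for i in range(n) if rng.random() >= rho}
    y = [rng.randrange(b) if i in S else x[i] for i in range(n)]
    return tuple(x), tuple(y), f(x), f(y), S
```

Note that in all three sketches an updated component may keep its old value, which matches the convention that updating draws a fresh uniform value over $[b]$.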

The UQ and NS models also have a Statistical Query (SQ) variant. Here, one
does not have access to actual examples. Instead, in the case of UQ for instance,
one can choose a polynomial-time computable function
$\Gamma : [b]^n \times \{1, -1\} \to \{1, -1\}$ and a tolerance $\tau \in [0, 1]$,
which is required to be at least inverse polynomially large, and the UQ-SQ
oracle returns a number $\gamma$ such that

\[ |\mathbb{E}[\Gamma(x, f(x))] - \gamma| \leq \tau. \]

Therefore, the learner can ask only about statistics over random examples. This
can be simulated in polynomial time under the UQ model using empirical
averaging. But the UQ model is strictly more powerful than UQ-SQ [16]. In the case
of NS, the function $\Gamma$ is allowed to depend on $x, y, f(x), f(y)$. This is
called a second-order statistical query.

$^3$ Note that a component is allowed to remain the same even if it is updated.
We will work with three classes of functions. First, we consider Boolean
formulae in Disjunctive Normal Form (DNF), in which case $b = 2$. A natural
generalization of DNFs to $b > 2$ was given in [12]: for each $1 \leq i \leq n$,
choose two values $0 \leq l_i \leq u_i \leq b - 1$, and consider the rectangle

\[ [l, u] = \{x \in [b]^n : l_i \leq x_i \leq u_i, \ \forall i\}. \]

An instance of UBOX is a union of rectangles. Note that in the Boolean case, a
DNF can be seen as a union of subcubes of $[2]^n$. The class of thresholds of
parities (TOP) applies only to $b = 2$. A TOP is a function of the form

\[ f(x) = \mathrm{sgn}\left( \sum_{m=1}^{M} w_m \chi_{a_m}(x) \right) \]

for $M$ vectors $a_m \in [2]^n$ and weights $w_m \in \mathbb{Z}$, where the
parities $\chi_a$ are defined in Section 2.2. It is assumed that the weight sum
$\sum_{m=1}^{M} |w_m|$ is of size polynomial in $n$.

We will be interested in learning function classes under the uniform
distribution. For any model $\mathcal{M}$, any function class $C$ and any
$\delta, \epsilon > 0$, we say that $C$ is $(\delta, \epsilon)$-learnable in
$\mathcal{M}$ if there is an algorithm $A$ such that for any function $f \in C$,
with probability at least $1 - \delta$, $A$ finds an $\epsilon$-approximator to
$f$ in time polynomial in the description size of $f$. We say that $C$ can be
weakly learned under $\mathcal{M}$ if there is $\delta > 0$ and $\epsilon$ of the
form $\frac{1}{2} - \frac{1}{\mathrm{poly}(n)}$ such that $C$ can be
$(\delta, \epsilon)$-learned in $\mathcal{M}$.

2.2 Fourier Analysis

The complex-valued$^4$ functions on $[b]^n$ form a linear space where a natural
inner product is given by

\[ \langle f, g \rangle = \mathbb{E}_x[f(x) g^*(x)], \]

where $*$ denotes complex conjugation. The set of all generalized parities
(parities for short)

\[ \chi_a(x) = \omega_b^{\sum_i a_i x_i}, \]

where $a \in [b]^n$ and $\omega_b = e^{2\pi i / b}$, forms an orthonormal basis and
any function can be written as a linear combination

\[ f(x) = \sum_{a \in [b]^n} \hat{f}(a) \chi_a(x), \tag{1} \]

where the Fourier coefficient $\hat{f}(a)$ is $\mathbb{E}[f(x) \chi_a^*(x)]$. A
useful result is Parseval's identity

\[ \mathbb{E}[|f(x)|^2] = \sum_{a \in [b]^n} |\hat{f}(a)|^2. \]

$^4$ In the Boolean case, we actually consider only real-valued functions.

In learning problems, Fourier-based algorithms usually estimate some of the
Fourier coefficients and build an approximation to $f$ in the form of a linear
combination as in (1) (and then take the sign or something slightly more
complicated in the case $b > 2$). There are two main cases where this technique
tends to work. In the "low-degree" case, most of (or at least a non-negligible
part of) the Fourier mass is concentrated on low-degree terms, i.e. terms
$\hat{f}(a)$ where $a$ has few nonzero components. Then one can estimate all
low-degree terms, which can lead to a subexponential algorithm. This is the idea
behind the algorithm for learning $AC^0$ circuits in [20]. In the "sparse" case,
most of the mass is concentrated on a few terms. Then one needs to find a way to
determine which terms should be estimated. This is the idea behind the algorithm
for learning decision trees in [19].

Because one often needs to estimate expectations, e.g. Fourier coefficients, 
using empirical averages, it is customary at this point to recall Hoeffding's lemma. 

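Lemma 1 (Hoeffding). Let $X_i$ be independent random variables all with mean
$\mu$ such that for all $i$, $c \leq X_i \leq d$. Then for any $\lambda > 0$,

\[ \mathbb{P}\left[ \left| \frac{1}{m} \sum_{i=1}^{m} X_i - \mu \right| \geq \lambda \right] \leq 2 e^{-2 \lambda^2 m / (d - c)^2}. \]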
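
As an illustration of how Lemma 1 is typically used, the following Python sketch picks a sample size from the bound and estimates a Boolean Fourier coefficient by an empirical average under uniform examples. The helper names (`hoeffding_samples`, `chi`, `estimate_coefficient`) are hypothetical, and the sketch assumes $b = 2$ and a target `f` with values in $\{1, -1\}$, so that each sample lies in $[c, d] = [-1, 1]$.

```python
import math
import random

def hoeffding_samples(lam, delta, c=-1.0, d=1.0):
    """Smallest m with 2 * exp(-2 * lam^2 * m / (d - c)^2) <= delta."""
    return math.ceil((d - c) ** 2 * math.log(2.0 / delta) / (2.0 * lam ** 2))

def chi(a, x):
    """Boolean parity chi_a(x) = (-1)^(sum_i a_i x_i)."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1

def estimate_coefficient(f, a, n, lam, delta, rng):
    """Empirical average of f(x) * chi_a(x) over uniform x in {0,1}^n.
    By Lemma 1 the result is within lam of the true coefficient
    with probability at least 1 - delta."""
    m = hoeffding_samples(lam, delta)
    total = 0
    for _ in range(m):
        x = [rng.randrange(2) for _ in range(n)]
        total += f(x) * chi(a, x)
    return total / m
```

For instance, with `lam = 0.1` and `delta = 0.05` the bound asks for a few hundred samples, independently of $n$.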



3 Negative Result for TOPs Learning 

For this section, we fix b = 2. As demonstrated in |[T6ll and Q, a nice feature of 
the SQ model is that it allows a complete unconditional characterization of what 
is learnable under this model. We prove in this section that parities cannot be
weakly learned in the Noise Sensitivity model with attribute noise rate at least
$\omega\left(\frac{\log n}{n}\right)$ (this includes the constant noise rate case
used in [5]). This implies in turn that TOPs cannot be weakly learned in this
model. Our lower bound on the noise rate is tight for this impossibility result.
Indeed, it is easy to see that for an attribute noise rate of
$O\left(\frac{\log n}{n}\right)$ one can actually learn parities. This follows
from the fact that at such a rate, there is a non-negligible probability of
witnessing an example $(x, y, f(x), f(y), S)$ with exactly one bit flip from $x$
to $y$, which allows one to decide whether or not the updated variable is
contained in the parity. One can then repeat for all variables (this can be
turned into a statistical query test). In this section, $y = \mathcal{N}_\rho(x)$
with $x$ uniform unless stated otherwise.

We follow a proof of [4]. The main difference is that we need to deal with
second-order queries.

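Lemma 2. Any SQ, $\Gamma(x, y, f(x), f(y))$, in the NS model can be replaced by
simple expectations, 1st-order queries of the form $\mathbb{E}[g(x) f(x)]$ (where
$x$ is uniform), and 2nd-order queries of the form
$\mathbb{E}[h(x, y) f(x) f(y)]$ where $f$ is the target function (this actually
applies to any second-order SQ model). Moreover, we can assume $|g(x)| \leq 1$
and $|h(x, y)| = 1$ for all $x, y \in [2]^n$.

Proof. Say we are trying to learn the function $f$. Because $f$ takes only values
$-1$ and $+1$, we have

\[
\begin{aligned}
\mathbb{E}[\Gamma(x, y, f(x), f(y))]
&= \mathbb{E}\left[ \sum_{i,j = +1,-1} \Gamma(x, y, i, j) \, \frac{1 + i f(x)}{2} \, \frac{1 + j f(y)}{2} \right] \\
&= \frac{1}{4} \sum_{i,j = +1,-1} \Big( \mathbb{E}[\Gamma(x, y, i, j)] + i \, \mathbb{E}_x\big[f(x) \, \mathbb{E}_y[\Gamma(x, y, i, j)]\big] \\
&\qquad\qquad + j \, \mathbb{E}_y\big[f(y) \, \mathbb{E}_x[\Gamma(x, y, i, j)]\big] + ij \, \mathbb{E}[f(x) f(y) \Gamma(x, y, i, j)] \Big).
\end{aligned}
\]

□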
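
The identity in the proof of Lemma 2 can be checked by brute force on a small cube. The sketch below is only a sanity check under assumptions made here: it uses $b = 2$, a random target `f`, a random $\{1,-1\}$-valued query table `G` standing in for $\Gamma$, and hypothetical helper names (`ns_weight`, `check_decomposition`). The inner conditional expectations are folded into joint expectations over $(x, y)$, which is equivalent by the tower property.

```python
import itertools
import random

def ns_weight(x, y, rho, b=2):
    """P[(x, y)] when x is uniform on [b]^n and y = N_rho(x)."""
    w = 1.0
    for xj, yj in zip(x, y):
        w *= (rho * (xj == yj) + (1 - rho) / b) / b
    return w

def check_decomposition(n, rho, rng):
    """Return both sides of the Lemma 2 identity for random f and Gamma."""
    cube = list(itertools.product(range(2), repeat=n))
    f = {x: rng.choice((-1, 1)) for x in cube}
    # an arbitrary second-order query with values in {-1, +1}
    G = {(x, y, i, j): rng.choice((-1, 1))
         for x in cube for y in cube for i in (-1, 1) for j in (-1, 1)}

    def E(g):
        # exact expectation over the NS distribution of (x, y)
        return sum(ns_weight(x, y, rho) * g(x, y) for x in cube for y in cube)

    lhs = E(lambda x, y: G[x, y, f[x], f[y]])
    rhs = 0.0
    for i in (-1, 1):
        for j in (-1, 1):
            rhs += 0.25 * (
                E(lambda x, y: G[x, y, i, j])                          # simple expectation
                + i * E(lambda x, y: f[x] * G[x, y, i, j])             # 1st-order in x
                + j * E(lambda x, y: f[y] * G[x, y, i, j])             # 1st-order in y
                + i * j * E(lambda x, y: f[x] * f[y] * G[x, y, i, j])  # 2nd-order
            )
    return lhs, rhs
```

Calling `check_decomposition(3, 0.7, random.Random(1))` returns two numbers that should agree up to floating-point error.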

Note that the 1st-order queries may not be computable in polynomial time
because the averages over $x, y$ are exponential sums (although they might be
estimated in polynomial time). But this is not a problem because what we will
show is that, no matter what the complexity of the queries is, the number of
queries has to be superpolynomial. Note also that the simple expectations do not
require the oracle (assuming the distribution of $x, y$ is known, as is the case
in the NS model). So we ignore them below. Finally, note that in the NS-SQ model,
expectations are unchanged if the roles of $x$ and $y$ are reversed.

Following Lemma 2, we can think of a weak learning algorithm as making a
polynomial number of 1st and 2nd-order queries. Denote by $s$ the size of
the target function. Say the algorithm $A$ makes $p(n, s)$ queries with tolerance
$1/r(n, s)$ and outputs a $(\frac{1}{2} - \frac{1}{q(n, s)})$-approximator, where
the queries are a collection of functions
$\{(g_i^{n,s}(x), h_i^{n,s}(x, y))\}_{i=1}^{p(n,s)}$ over $x, y \in [2]^n$ with
$|g_i^{n,s}(x)| \leq 1$ and $|h_i^{n,s}(x, y)| = 1$ for all $x, y \in [2]^n$. We
now characterize weakly learnable classes in NS-SQ (the characterization
actually applies to any second-order SQ model). For this proof, we assume that
the confidence parameter $\delta = 0$ (but see the remark after the proof).



Lemma 3. Let $r'(n, s) = \max\{2 r(n, s), q(n, s)\}$. Denote by $C_{n,s}$ the class
of functions in $C$ restricted to instances of $n$ variables and size at most $s$.
If $C$ is weakly learnable under NS-SQ (using an algorithm with parameters
described above), then there exists a collection $\{V_{n,s}\}_{n,s \geq 1}$ with
$V_{n,s}$ of the form

\[ V_{n,s} = \{(k_i^{n,s}(x), l_i^{n,s}(x, y))\}_{i=1}^{p'(n,s)} \]

with $|k_i^{n,s}(x)| \leq 1$ and $|l_i^{n,s}(x, y)| = 1$ for all $x, y \in [2]^n$,
and $p'(n, s) \leq p(n, s) + 1$ such that

\[ \forall f \in C_{n,s}, \ \exists i, \quad |\mathbb{E}_x[f(x) k_i^{n,s}(x)]| + |\mathbb{E}[f(x) f(y) l_i^{n,s}(x, y)]| \geq \frac{1}{r'(n, s)}. \tag{2} \]

Proof. We start with $V_{n,s} = \emptyset$. We simply simulate the weak learning
algorithm $A$ with an oracle that returns the value 0 to each query. Every time
$A$ makes a query, we add that query to $V_{n,s}$. At one point $A$ stops and
returns the hypothesis $\sigma$. We add $(\sigma, 1)$ to $V_{n,s}$. It is clear
that $p'(n, s) \leq p(n, s) + 1$. Assume that (2) is not satisfied. Then there is
a function $f$ such that

\[ |\mathbb{E}_x[f(x) k_i^{n,s}(x)]| < \frac{1}{r'(n, s)}, \qquad |\mathbb{E}[f(x) f(y) l_i^{n,s}(x, y)]| < \frac{1}{r'(n, s)}, \]

for all $i$. Therefore, in our simulation, the zeros we gave as answers to the
queries were valid answers (i.e. within the tolerance $\frac{1}{r(n, s)}$) and
therefore, because $A$ returns a weak approximator, it has to be the case that
$\sigma$ is a $(\frac{1}{2} - \frac{1}{q(n, s)})$-approximator. This implies that

\[ |\mathbb{E}[f(x) \sigma(x)]| + |\mathbb{E}[f(x) f(y) \cdot 1]| \geq |\mathbb{E}[f(x) \sigma(x)]| \geq \frac{2}{q(n, s)} > \frac{1}{r'(n, s)}, \]

a contradiction. □

As noted in [4], because the previous proof does not rely on the uniformity
of the learning algorithm and because $BPP \subseteq P/\mathrm{poly}$, the proof
also applies to randomized algorithms.

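Theorem 1. The class of parity functions cannot be weakly learned in NS-SQ with
attribute noise rate $\omega\left(\frac{\log n}{n}\right)$.

Proof. Because the size of the function is bounded by a polynomial in $n$, we drop
$s$ from the previous notations. Suppose to the contrary that there is an algorithm
$A$ with parameters as described above that weakly learns parities. By Lemma 3,
and because $u^2 + v^2 \geq (|u| + |v|)^2 / 2$, we have for all $a \in [2]^n$

\[ \sum_{i=1}^{p'(n)} \left( \mathbb{E}^2[\chi_a(x) k_i^n(x)] + \mathbb{E}^2[\chi_a(x) \chi_a(y) l_i^n(x, y)] \right) \geq \frac{1}{2 (r'(n))^2}. \]

Taking expectation over uniform $a \in [2]^n$, this is

\[ \sum_{i=1}^{p'(n)} \mathbb{E}_a\big[\mathbb{E}^2[\chi_a(x) k_i^n(x)]\big] + \sum_{i=1}^{p'(n)} \mathbb{E}_a\big[\mathbb{E}^2[\chi_a(x) \chi_a(y) l_i^n(x, y)]\big] \geq \frac{1}{2 (r'(n))^2}. \]

Then either

\[ \sum_{i=1}^{p'(n)} \mathbb{E}_a\big[\mathbb{E}^2[\chi_a(x) k_i^n(x)]\big] \geq \frac{1}{4 (r'(n))^2}, \tag{3} \]

or

\[ \sum_{i=1}^{p'(n)} \mathbb{E}_a\big[\mathbb{E}^2[\chi_a(x) \chi_a(y) l_i^n(x, y)]\big] \geq \frac{1}{4 (r'(n))^2}. \tag{4} \]

In case (3), we get a contradiction by following the same steps as in
[4, Theorem 34], which we do not repeat here (their $k$ becomes $n$ and their $p$
becomes $\frac{1}{2}$). The attribute noise rate does not play a role in that
case. Below, we derive a contradiction out of (4), which follows a similar
argument. From (4) there is an $i$ such that

\[ \Lambda = \mathbb{E}_a\big[\mathbb{E}^2[l_i^n(x, y) \chi_a(x) \chi_a(y)]\big] \geq \frac{1}{4 p'(n) (r'(n))^2}. \tag{5} \]

Taking $(u, v)$ to be an independent copy of $(x, y)$, we also have

\[
\begin{aligned}
\Lambda &= \mathbb{E}_a\big[\mathbb{E}[l_i^n(x, y) \chi_a(x) \chi_a(y)] \, \mathbb{E}[l_i^n(u, v) \chi_a(u) \chi_a(v)]\big] \\
&= \mathbb{E}_{(x,y)}\big[\mathbb{E}_{(u,v)}\big[l_i^n(x, y) \, l_i^n(u, v) \, \mathbb{E}_a[\chi_a(x \oplus y \oplus u \oplus v)]\big]\big],
\end{aligned}
\]

where $\oplus$ is the parity operator. Denote $\gamma_1 = x \oplus u$ and
$\gamma_2 = y \oplus v$. Recall that $|l_i^n(x, y)| = 1$ for all
$x, y \in [2]^n$. Then

\[
\begin{aligned}
|\Lambda| &\leq \mathbb{E}_{\gamma_1, \gamma_2}\big[\,|\mathbb{E}_a[\chi_a(\gamma_1 \oplus \gamma_2)]|\,\big] \\
&= \mathbb{E}_{\gamma_1, \gamma_2}\big[\,|\mathbb{E}_a[\chi_{\gamma_1 \oplus \gamma_2}(a)]|\,\big] \\
&= \mathbb{E}_{\gamma_1}\big[\mathbb{P}_{\gamma_2}[\gamma_1 = \gamma_2]\big] \\
&= \left( \rho^2 + \frac{1}{2}(1 - \rho^2) \right)^n \\
&= \left( 1 - (1 - \rho) + \frac{1}{2}(1 - \rho)^2 \right)^n.
\end{aligned}
\]

This last term is the inverse of a superpolynomial if
$(1 - \rho) = \omega\left(\frac{\log n}{n}\right)$, which contradicts (5). □

In the case of constant attribute noise rate, the proof actually implies that even
the parities over the first $\omega(\log n)$ variables cannot be weakly learned.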
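
The collision probability $\mathbb{P}[\gamma_1 = \gamma_2]$ at the heart of this bound can be verified exactly for small $n$. The following Python sketch enumerates two independent NS pairs on the Boolean cube and compares the result with the closed form $\left(\frac{1 + \rho^2}{2}\right)^n$, which is the same quantity as $\left(1 - (1-\rho) + \frac{1}{2}(1-\rho)^2\right)^n$. The function names are hypothetical and the enumeration is exponential in $n$, so it is meant only as a sanity check.

```python
import itertools

def ns_cond(x, y, rho):
    """P[N_rho(x) = y | x] on the Boolean cube."""
    p = 1.0
    for xj, yj in zip(x, y):
        p *= rho * (xj == yj) + (1 - rho) / 2
    return p

def collision_probability(n, rho):
    """Exact P[x XOR u = y XOR v] for independent NS pairs (x, y) and (u, v)."""
    cube = list(itertools.product(range(2), repeat=n))
    total = 0.0
    for x in cube:
        for u in cube:
            for y in cube:
                # gamma_1 = gamma_2 forces v = x XOR u XOR y
                v = tuple(p ^ q ^ r for p, q, r in zip(x, u, y))
                total += ns_cond(x, y, rho) * ns_cond(u, v, rho)
    return total / len(cube) ** 2   # x and u are uniform

def closed_form(n, rho):
    return ((1 + rho ** 2) / 2) ** n
```

Per coordinate, the two sides agree with certainty when neither copy is updated (probability $\rho^2$) and with probability $\frac{1}{2}$ otherwise, which is where the closed form comes from.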



4 Harmonic Sieve in Cyclic Random Walk Model 

In this section, we show that HS can be performed efficiently in the CRW model. 
We also prove that CRW is strictly weaker than MQ under a standard crypto- 
graphic assumption. 

Theorem 2. The algorithm HS can be performed in the CRW model with a poly- 
nomial increase in time (and an arbitrarily small probability of error). 

As an immediate corollary we get the following. 

Corollary 1. For any $\delta, \epsilon > 0$, DNFs, TOPs and UBOXs are
$(\delta, \epsilon)$-learnable in the CRW model.

The proof of Theorem 2 follows.

Proof. We only need to check that we can estimate the sums of squares of Fourier
coefficients appearing in the Goldreich-Levin algorithm. Without loss of
generality, we can rename all components of $x$ so that the components are
updated in the order $(n, n-1, \ldots, 1)$. For $1 \leq k \leq n$ and
$a \in [b]^k$, let

\[ C_{a,k} = \{\hat{f}(ad) : d \in [b]^{n-k}\}, \]

where $ad$ is the concatenation of $a$ and $d$. Then Jackson [12] showed that it is
enough to estimate within inverse polynomial additive tolerance the sum of the
squares of terms in $C_{a,k}$, which he also shows to be equal to

\[ L_2^2(C_{a,k}) = \sum_{d \in [b]^{n-k}} |\hat{f}(ad)|^2 = \mathbb{E}\big[\mathrm{Re}\big(f^*(yx) f(zx) \chi_a(y - z)\big)\big], \]

where $x \in [b]^{n-k}$, $y \in [b]^k$ and $z \in [b]^k$ are independent uniform,
and $y - z$ is taken to be the difference in $\mathbb{Z}_b^k$. In the CRW model,
this estimation can be achieved through the following simulation. Make $n$
queries to obtain a uniform instance. Then make $n - k$ queries to update the
last $n - k$ bits and get $yx$ and $f(yx)$. Then make $k$ more queries to update
the first $k$ bits and get $zx$ and $f(zx)$. It is clear that $x, y, z$ are as
required above. From this, compute
$\mathrm{Re}(f^*(yx) f(zx) \chi_a(y - z))$. Repeat sufficiently (polynomially)
many times and apply Hoeffding's lemma. This takes $2n$ times as many queries as
in the MQ model. The rest of the HS algorithm applies without change. Note, in
particular, that the boosting part does
not require membership queries (see also |@J Theorem 21]). Note also that we 
did not assume that $f$ is Boolean above. □
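
The simulation in the proof of Theorem 2 can be sketched as follows for the Boolean case $b = 2$. This is a minimal illustration under assumptions made here: components are 0-indexed, the cycle updates them in the order $n-1, n-2, \ldots, 0$, the target `f` accepts any sequence of bits, and the names `exact_sum_of_squares` and `crw_estimate` are hypothetical. The exact brute-force value is included only to compare against the walk-based estimate.

```python
import itertools
import random

def chi(a, x):
    """Boolean parity chi_a(x) = (-1)^(sum_i a_i x_i)."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1

def exact_sum_of_squares(f, n, a):
    """Brute-force sum over d of fhat(ad)^2, for a tuple a of length k."""
    k = len(a)
    cube = list(itertools.product(range(2), repeat=n))
    total = 0.0
    for d in itertools.product(range(2), repeat=n - k):
        coeff = sum(f(x) * chi(a + d, x) for x in cube) / len(cube)
        total += coeff ** 2
    return total

def crw_estimate(f, n, a, rounds, rng):
    """Estimate the same quantity from a single cyclic random walk."""
    k = len(a)
    x = [rng.randrange(2) for _ in range(n)]
    order = list(range(n - 1, -1, -1))   # update order n-1, ..., 0
    pos = 0

    def step():
        nonlocal pos
        x[order[pos % n]] = rng.randrange(2)
        pos += 1

    total = 0
    for _ in range(rounds):
        for _ in range(n):               # full cycle: fresh uniform instance
            step()
        for _ in range(n - k):           # refresh the last n-k components
            step()
        y, fy = tuple(x[:k]), f(x)       # instance yx
        for _ in range(k):               # refresh the first k components
            step()
        z, fz = tuple(x[:k]), f(x)       # instance zx, same x part
        diff = tuple(p ^ q for p, q in zip(y, z))   # y - z in Z_2^k
        total += fy * fz * chi(a, diff)
    return total / rounds
```

Each round uses $2n$ walk steps, matching the overhead stated in the proof; the number of rounds would be chosen through Hoeffding's lemma.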

Theorem 3. If one-way functions exist, the CRW model is strictly weaker than the 
MQ model. 

Proof. We proceed as in [5, Proposition 2]. If one-way functions exist then there
exists a pseudorandom function family
$\{f_s : [2]^n \to \{1, -1\}\}_{s \in \{1,-1\}^n}$ [11]. Consider the function
$g_s$ which is equal to $f_s$ except on inputs of the form $e_i$ (i.e. the vector
with 0's everywhere except on component $i$ where it is 1), where the function is
defined as $s_i$. Then using membership queries, one can learn $s$ from queries
to $g_s$ and therefore one can learn $g_s$. On the other hand, in the CRW model,
with probability $1 - 2^{-\Omega(n)}$, one never sees the instances $e_i$.
Therefore if it were possible to learn $g_s$ in this model, this would be
essentially equivalent to efficiently learning $f_s$ in the MQ model (by
simulation of the conditioned walk), which leads to a contradiction. □



5 Learning Unions of Rectangles 

The purpose of this section is to extend the DNF learning algorithm of [5] in the
Noise Sensitivity model to the $[b]^n$ setting. The learning algorithm of [5]
proceeds in a fashion similar to that of [12] except that it uses weighted sums
of squared Fourier coefficients (related to the so-called Bonami-Beckner
operator) and considers only $O(\log n)$-degree terms. Therefore the main task in
extending this algorithm to UBOXs is to define an appropriate substitute for the
Bonami-Beckner operator and show that low-degree terms are also sufficient in
this case. The latter was proved by Jackson [12, Corollary 17]. We tackle the
former problem in the following theorem.

Theorem 4. For any $\delta, \epsilon > 0$, the class of UBOXs is
$(\delta, \epsilon)$-learnable in the Noise Sensitivity model, and therefore in
the Random Walk model as well.

Proof. We seek to generalize the weighted sum of squared coefficients used in [5].
A requirement is that it must be possible to estimate the partial sums
corresponding to fixing $O(\log n)$ components in the Noise Sensitivity model. A
natural choice seems to be

\[ (T_\rho f)(x) = \mathbb{E}_{y = \mathcal{N}_\rho(x)}[f(y)], \]

where recall that $\mathcal{N}_\rho(x)$ is a noisy version of $x$ where each
component is updated independently with probability $1 - \rho$. Here $\rho$ is a
fixed constant. Because the operator $T_\rho$ is linear, it suffices to compute
its action on the basis functions. Denote by $|S|$ the cardinality of
$S \subseteq \{1, \ldots, n\}$ and by $|a|$ the number of nonzero components of
$a \in [b]^n$. For a vector $x$ and a set $S$, we denote by $x_S$ the vector $x$
restricted to components in $S$, and $0_{S^c} x_S$ signifies the vector which has
0's on components in $S^c$ and is equal to $x$ on components in $S$. For any
$a \in [b]^n$, we have

\[
\begin{aligned}
\mathbb{E}_{y = \mathcal{N}_\rho(x)}[\chi_a(y)]
&= \frac{1}{b^n} \sum_{z \in [b]^n} \sum_{m=0}^{n} \sum_{S : |S| = m} (1 - \rho)^m \rho^{n-m} \chi_a(x + 0_{S^c} z_S) \\
&= \chi_a(x) \sum_{m=0}^{n} \sum_{S : |S| = m} (1 - \rho)^m \rho^{n-m} \, \frac{1}{b^n} \sum_{z \in [b]^n} \chi_a(0_{S^c} z_S) \\
&= \chi_a(x) \sum_{m=0}^{n} \sum_{S : |S| = m} (1 - \rho)^m \rho^{n-m} \, \mathbb{1}\{|a_S| = 0\} \\
&= \chi_a(x) \, \rho^{|a|} \sum_{m=0}^{n - |a|} \binom{n - |a|}{m} (1 - \rho)^m \rho^{n - |a| - m} \\
&= \rho^{|a|} \chi_a(x).
\end{aligned}
\]

Therefore,

\[ (T_\rho f)(x) = \sum_{a \in [b]^n} \rho^{|a|} \hat{f}(a) \chi_a(x). \]
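
The eigenvalue computation above, $\mathbb{E}_{y = \mathcal{N}_\rho(x)}[\chi_a(y)] = \rho^{|a|} \chi_a(x)$, can be checked exactly for small $b$ and $n$. The sketch below sums over all $y \in [b]^n$ with the exact transition probabilities of the noise operator; the names `chi` and `noisy_parity` are hypothetical helpers introduced for this illustration.

```python
import cmath
import itertools

def chi(a, x, b):
    """Generalized parity chi_a(x) = omega_b^(sum_i a_i x_i)."""
    return cmath.exp(2j * cmath.pi * sum(ai * xi for ai, xi in zip(a, x)) / b)

def noisy_parity(a, x, rho, b):
    """Exact E[chi_a(N_rho(x))], summing over every y in [b]^n."""
    total = 0j
    for y in itertools.product(range(b), repeat=len(x)):
        p = 1.0
        for xj, yj in zip(x, y):
            # component kept w.p. rho, otherwise redrawn uniformly over [b]
            p *= rho * (xj == yj) + (1 - rho) / b
        total += p * chi(a, y, b)
    return total
```

For example, with $b = 3$, $a = (1, 0, 2)$ and $\rho = 0.6$, the result should equal $0.36 \, \chi_a(x)$ up to floating-point error, since $|a| = 2$.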
This kind of operator has been used before. See e.g. [14].
We are interested in partial sums of the form

\[ T(I) = \sum_{a : |a_I| = |I|} \rho^{|a|} |\hat{f}(a)|^2, \]

where $I \subseteq \{1, \ldots, n\}$. Indeed, those allow one to perform the
breadth-first search algorithm in [5, Theorem 7]. Note first that we get a
similar upper bound on the weighted Fourier mass of a fixed level of the BFS tree

\[
\begin{aligned}
\sum_{I : |I| = j} T(I)
&= \sum_{I : |I| = j} \ \sum_{a : |a_I| = |I|} \rho^{|a|} |\hat{f}(a)|^2 \\
&= \sum_{a : |a| \geq j} \binom{|a|}{j} \rho^{|a|} |\hat{f}(a)|^2 \\
&\leq \sum_{a : |a| \geq j} |\hat{f}(a)|^2 \sum_{t = j}^{\infty} \binom{t}{j} \rho^t \\
&= \rho^j (1 - \rho)^{-j-1} \sum_{a : |a| \geq j} |\hat{f}(a)|^2 \\
&\leq \max_x |f(x)|^2 \, \rho^j (1 - \rho)^{-j-1},
\end{aligned}
\]

where we have used Parseval's identity. The rest of the proof of [5, Theorem 7]
goes without change. The only difference is that now to every
$I \subseteq \{1, \ldots, n\}$ correspond $(b - 1)^{|I|}$ vectors $a \in [b]^n$
with $|a_I| = |I|$ and $|a_{I^c}| = 0$. But we can afford to estimate all of them
because $|I| = O(\log n)$. Therefore we can find all inversely polynomial
coefficients of order $O(\log n)$. Also, we need to check that any UBOX has at
least one inversely polynomial coefficient of order $O(\log n)$ and that boosting
is possible. This is done in [12, Section 6]. The only point to note is that in
the proofs of [12, Fact 14, Corollary 17], one can choose the parity $\chi_a$ to
have all its components outside the variables included in the
$O(\log n)$-rectangle used in the proof (see also [4, Lemma 18]).

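It only remains to show that the $T(I)$'s can be estimated in the Noise
Sensitivity model. As in [5], we consider the distribution over pairs
$(x, y) \in [b]^n \times [b]^n$ which is $(x, \mathcal{N}_\rho(x))$ conditioned on
the event that at least all components in $I$ are updated; we denote it by
$\mathcal{D}_\rho^{(I)}$. This can be simulated in the Noise Sensitivity model by
simply picking examples $(x, y, f(x), f(y), S)$ until one gets that
$I \subseteq S$ (which takes polynomial time if $|I| = O(\log n)$). Then note that

\[
\begin{aligned}
T'(I) &= \mathbb{E}_{\mathcal{D}_\rho^{(I)}}[f(x) f(y)] \\
&= \mathbb{E}_{\mathcal{D}_\rho^{(I)}}\Big[ \sum_{c,d} \hat{f}(c) \hat{f}(d) \chi_c(x) \chi_d(y) \Big] \\
&= \frac{1}{b^{2n}} \sum_{x,z} \sum_{m=0}^{n - |I|} \ \sum_{S \supseteq I : |S| = m + |I|} \ \sum_{c,d} (1 - \rho)^m \rho^{n - |I| - m} \hat{f}(c) \hat{f}(d) \chi_c(x) \chi_d(x + 0_{S^c} z_S) \\
&= \frac{1}{b^{2n}} \sum_{x,z} \sum_{m=0}^{n - |I|} \ \sum_{S \supseteq I : |S| = m + |I|} \ \sum_{c,d} (1 - \rho)^m \rho^{n - |I| - m} \hat{f}(c) \hat{f}(d) \chi_{c+d}(x) \chi_d(0_{S^c} z_S) \\
&= \sum_{m=0}^{n - |I|} \ \sum_{S \supseteq I : |S| = m + |I|} \ \sum_{c,d} (1 - \rho)^m \rho^{n - |I| - m} \hat{f}(c) \hat{f}(d) \, \mathbb{1}\{c + d = 0\} \, \mathbb{1}\{|d_S| = 0\} \\
&= \sum_{c} \sum_{m=0}^{n - |I|} \ \sum_{S \supseteq I : |S| = m + |I|} (1 - \rho)^m \rho^{n - |I| - m} |\hat{f}(c)|^2 \, \mathbb{1}\{|c_S| = 0\} \\
&= \sum_{c : |c_I| = 0} \rho^{|c|} |\hat{f}(c)|^2 \sum_{m=0}^{n - |I| - |c|} \binom{n - |I| - |c|}{m} (1 - \rho)^m \rho^{n - |I| - |c| - m} \\
&= \sum_{c : |c_I| = 0} \rho^{|c|} |\hat{f}(c)|^2,
\end{aligned}
\]

where we have used that if $f$ is real and $c + d = 0 \bmod b$, then

\[ \hat{f}(d) = (\hat{f}(c))^*. \]

Denote $T''(I) = T'(\emptyset) - T'(I)$. This is

\[ T''(I) = \sum_{c : |c_I| > 0} \rho^{|c|} |\hat{f}(c)|^2. \]

We want to estimate $T(I)$, which consists of a sum over $\{a : |a_I| = |I|\}$. We
now know how to estimate the same sum over $\{a : |a_J| > 0\}$ for any $J$. Noting
that $\{a : |a_J| > 0\}$ is made precisely of all $\{a : |a_K| = |K|\}$ with
$K \subseteq J$, it is easy to see that $T(I)$ can be estimated through the
$T''(J)$'s for $J \subseteq I$ by inclusion-exclusion. Explicitly, for nonempty
$I$,

\[ T(I) = \sum_{\emptyset \neq J \subseteq I} (-1)^{|J| + 1} \, T''(J). \]

Since there are only $2^{|I|}$ such $J$'s and $|I| = O(\log n)$, this can be done
in polynomial time. The rest of the argument is as in [5, Theorem 11]. □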
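
The inclusion-exclusion step can be checked numerically on a small domain. The sketch below computes all weighted masses $\rho^{|a|} |\hat{f}(a)|^2$ of a function given as a table on $[b]^n$ by brute force, then compares $T(I)$ computed directly with the alternating sum of the quantities $T'(\emptyset) - T'(J)$ over nonempty $J \subseteq I$. All function names are hypothetical, components are 0-indexed, and the exact masses stand in for the estimates that would be obtained from noise-sensitivity examples.

```python
import cmath
import itertools
import random

def weighted_masses(f, n, b, rho):
    """Map each a in [b]^n to rho^|a| * |fhat(a)|^2 (f is a dict on [b]^n)."""
    cube = list(itertools.product(range(b), repeat=n))
    out = {}
    for a in cube:
        coeff = sum(
            f[x] * cmath.exp(-2j * cmath.pi * sum(p * q for p, q in zip(a, x)) / b)
            for x in cube
        ) / len(cube)
        out[a] = rho ** sum(ai != 0 for ai in a) * abs(coeff) ** 2
    return out

def T(mass, I):
    """Sum over a whose components in I are all nonzero."""
    return sum(w for a, w in mass.items() if all(a[i] != 0 for i in I))

def T_prime(mass, J):
    """Sum over a whose components in J are all zero."""
    return sum(w for a, w in mass.items() if all(a[i] == 0 for i in J))

def T_by_inclusion_exclusion(mass, I):
    """Recover T(I) from T''(J) = T'(empty) - T'(J), J nonempty subset of I."""
    total = 0.0
    for r in range(1, len(I) + 1):
        for J in itertools.combinations(I, r):
            total += (-1) ** (r + 1) * (T_prime(mass, ()) - T_prime(mass, J))
    return total
```

The loop visits $2^{|I|} - 1$ subsets, which stays polynomial when $|I| = O(\log n)$.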